Stepping Out of the Shadows: Reinforcement Learning in Shadow Mode
Gassert, Philipp, Althoff, Matthias
Reinforcement learning (RL) is not yet competitive for many cyber-physical systems, such as robotics, process automation, and power systems, as training on a system with physical components cannot be accelerated, and simulation models do not exist or suffer from a large simulation-to-reality gap. During the long training time, expensive equipment cannot be used and might even be damaged by inappropriate actions of the reinforcement learning agent. Our novel approach addresses exactly this problem: We train the reinforcement learning agent in a so-called shadow mode with the assistance of an existing conventional controller, which does not have to be trained and instantaneously performs reasonably well. In shadow mode, the agent relies on the controller to provide action samples and guidance towards favourable states to learn the task, while simultaneously estimating for which states the learned agent will receive a higher reward than the conventional controller. The RL agent then controls the system in these states, while all other regions remain under the control of the existing controller. Over time, the RL agent takes over an increasing number of states, while leaving control to the baseline in regions where it cannot surpass the baseline's performance. Thus, we keep regret during training low and improve performance compared to only using conventional controllers or reinforcement learning. We present and evaluate two mechanisms for deciding whether to use the RL agent or the conventional controller. The usefulness of our approach is demonstrated for a reach-avoid task, for which we are able to effectively train an agent where standard approaches fail.
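The switching idea in the abstract can be illustrated with a minimal sketch. This is a hypothetical reduction, not the paper's actual mechanism: the RL agent only acts in states where its estimated value exceeds the baseline controller's estimated value; everywhere else the conventional controller keeps control. All function names here are illustrative assumptions.

```python
# Hypothetical sketch of shadow-mode switching (the paper's two decision
# mechanisms are more involved than a simple value comparison).

def shadow_mode_step(state, rl_agent, baseline, rl_value, baseline_value):
    """Return (action, source) for one control step.

    rl_agent / baseline: policies mapping state -> action.
    rl_value / baseline_value: estimated per-state performance of each policy.
    """
    if rl_value(state) > baseline_value(state):
        # RL agent is expected to outperform the baseline here, so it takes over.
        return rl_agent(state), "rl"
    # Otherwise the conventional controller keeps control, keeping regret low.
    return baseline(state), "baseline"
```

As training progresses and the RL agent's value estimates improve, the set of states satisfying the comparison grows, matching the gradual takeover described in the abstract.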
Using CD with machine learning models to tackle fraud
Credit card fraudsters are always changing their behavior, developing new tactics. For banks, the damage isn't just financial; their reputations are also on the line. So how do banks stay ahead of the crooks? For many, detection algorithms are essential. Given enough data, a supervised machine learning model can learn to detect fraud in new credit card applications. This model will give each application a score -- typically between 0 and 1 -- to indicate the likelihood that it's fraudulent. The banks can then set a threshold above which they regard an application as fraudulent -- typically that threshold will enable the bank to keep false positives and false negatives at a level it finds acceptable. False positives are genuine applications that have been mistaken for fraud; false negatives are fraudulent applications that are missed.
- Banking & Finance (0.72)
- Law Enforcement & Public Safety > Fraud (0.67)
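The thresholding step described above can be sketched in a few lines. This is a toy illustration, not any bank's actual system: given model scores in [0, 1] and ground-truth labels, a chosen threshold determines the counts of false positives and false negatives.

```python
# Toy illustration of score thresholding for fraud detection.
# labels: 1 = fraudulent application, 0 = genuine application.

def confusion_counts(scores, labels, threshold):
    """Count false positives and false negatives at a given threshold."""
    # False positive: genuine application flagged as fraud.
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    # False negative: fraudulent application that slips through.
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    return fp, fn
```

Raising the threshold trades false positives for false negatives, which is exactly the balance the bank tunes to a level it finds acceptable.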
Tesla's new Autopilot will run in 'shadow mode' to prove that it's safer than human driving
Tesla today rolled out a next-generation version of its autonomous driving hardware suite, which the company says should enable a Tesla to autonomously drive from LA to New York, dropping a rider off in Times Square and then going to park itself. But, before Teslas can start driving autonomously, the company needs to collect a lot of data to prove to customers (and regulators) that the technology is safe and reliable. So, the car will run Autopilot in "shadow mode" in order for Tesla to gather statistical data on the software's false positives and false negatives. In shadow mode, the car isn't taking any action, but it registers when it would have taken action. Then, if the Tesla is in an accident, the company can see whether the autonomous mode would have avoided the accident (or the other way around, with the self-driving system potentially causing an accident).
- Transportation > Ground > Road (0.59)
- Information Technology > Robotics & Automation (0.59)
- Automobiles & Trucks (0.59)
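The logging behavior the article describes can be sketched as follows. This is a hedged illustration, not Tesla's implementation: the autonomous stack computes what it would have done, but only the human driver's action is executed, and any disagreement is recorded for later statistical analysis.

```python
# Illustrative sketch of shadow-mode logging: the autonomy policy runs
# passively alongside the human driver and never controls the vehicle.

def shadow_drive_step(observation, human_action, autonomy_policy, log):
    """Apply the human's action; log when the shadow system would differ."""
    planned = autonomy_policy(observation)
    if planned != human_action:
        # Record the disagreement so false positives/negatives can be
        # analyzed offline, e.g. after an accident.
        log.append({"obs": observation, "human": human_action, "shadow": planned})
    return human_action  # the car only ever executes the human's action
```

Accumulating such logs over a large fleet is what lets the manufacturer argue statistically whether the shadow system would have avoided (or caused) real accidents.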